Structural analysis of a chromatin model
Structural models generated in previous tutorials can also be analyzed in more depth. This tutorial describes some classical methods used by structural biologists that are included in TADbit.
These methods are mainly designed to be used on single models, so we start this tutorial by loading a single structural model:
from pytadbit.imp.impmodel import load_impmodel_from_cmm, load_impmodel_from_xyz
model = load_impmodel_from_cmm('./model.117.cmm') # model 117 corresponds to the centromere model of cluster 1
General shape of the model
This section presents some methods that give a rough description of the three-dimensional occupancy of a model.
Finding the center of mass and radius of gyration
A quick way to assess how dense or compact a model is consists of calculating its radius of gyration (see pytadbit.imp.impmodel.IMPmodel.radius_of_gyration()) and its center of mass:
print model.center_of_mass()
print model.radius_of_gyration()
{'y': 0.001468639622746898, 'x': 0.001201006496139238, 'z': 4.508469448122167e-05}
3183.88015451
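To make these two quantities concrete, here is a minimal sketch on a toy set of coordinates. It assumes that all particles have the same mass and that the radius of gyration is the root mean square distance to the center of mass. The two helper functions are written for illustration only and are not TADbit's implementation, which may differ in details.

```python
from math import sqrt

def center_of_mass(coords):
    """Mean position of equally weighted particles given as (x, y, z) tuples."""
    n = float(len(coords))
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def radius_of_gyration(coords):
    """Root mean square distance of the particles to their center of mass."""
    com = center_of_mass(coords)
    sq_dists = [sum((c[k] - com[k]) ** 2 for k in range(3)) for c in coords]
    return sqrt(sum(sq_dists) / len(coords))

# toy example: four particles at the corners of a square of side 2 (arbitrary units)
toy = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)]
print(center_of_mass(toy))
print(radius_of_gyration(toy))
```

For a fixed number of particles, a smaller radius of gyration indicates a more compact model.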
As an extra feature, the radius of gyration (or gyradius) can also be displayed within Chimera:
model.view_model(tool='chimera_nogui', savefig='/tmp/image_model_2.png', centroid=True, gyradius=True)
model.view_model(tool='chimera_nogui', centroid=True, gyradius=True, savefig='/tmp/image_model_2.webm')
Model length
print model.contour()
96444.7657895
The modeled chromatin strand is thus about 96445 nm long.
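As an illustration of what such a length measures, the sketch below sums the straight-line distances between consecutive particles of a toy path. This is an assumption about how a contour length is typically obtained, and the function `contour_length` is a hypothetical helper, not TADbit's own code.

```python
from math import sqrt

def contour_length(coords):
    """Sum of the distances between consecutive particles."""
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords[:-1], coords[1:]):
        total += sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)
    return total

# toy path made of two segments of length 5 and 12 (arbitrary units)
path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(contour_length(path))
```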
Fitting model into a cube
Find the longest and shortest distances between two particles:
print model.longest_axe()
print model.shortest_axe()
10434.4715198
528.597592881
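Taking the description above at face value (longest and shortest distance over all pairs of particles), the idea can be sketched on toy coordinates as follows. The helpers `distance` and `extreme_distances` are hypothetical names used for illustration, and the way TADbit defines these axes internally may differ.

```python
from itertools import combinations
from math import sqrt

def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def extreme_distances(coords):
    """Return (longest, shortest) distance over all pairs of particles."""
    dists = [distance(p, q) for p, q in combinations(coords, 2)]
    return max(dists), min(dists)

# toy model with four particles (arbitrary units)
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 4.0)]
longest, shortest = extreme_distances(toy)
print(longest)
print(shortest)
```

This brute-force version compares every pair, which is fine for models of about a hundred particles.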
Characterize a cube that includes the model:
print model.cube_side()
print model.cube_volume()
12927.7275225
2.16056118551e+12
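The two printed values are consistent with the volume simply being the cube of the side, which can be checked with the numbers above. This is only an arithmetic check on the printed output, not a statement about how TADbit computes the side itself.

```python
side = 12927.7275225              # value printed by model.cube_side()
printed_volume = 2.16056118551e12 # value printed by model.cube_volume()

volume = side ** 3
relative_gap = abs(volume - printed_volume) / printed_volume
print('%.4e' % volume)
print(relative_gap < 1e-6)
```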
Chromatin accessibility: fitting objects inside the model
In order to infer which part of the modeled chromatin can be accessed by an object, such as the transcription machinery, TADbit calculates a mesh around the model and checks, for each point of this mesh, whether an object of a given size can fit.
Here is an example revealing the surface of a chromatin strand accessible to a hypothetical protein with a diameter of 1000 nanometers (radius of 500 nanometers):
acc_dots, tot_dots, acc_area, tot_area, acc_vs_inacc = model.accessible_surface(
500, superradius=1500, nump=100, verbose=True, write_cmm_file='./model_mesh.cmm',
savefig='/tmp/model_mesh.webm', chimera_bin='chimera_nogui')
Accessible surface: 133.61 micrometers^2(4253 accessible times 0.0314159265359 micrometers) (4253 accessible dots of 6736 total times 0.03142 micrometers) - 63.14% of the contour mesh - 43.65% of a virtual straight chromatin (306.13 microm^2)
The call above produces a large amount of information.
- The text printed (when verbose=True) gives some general statistics about the accessibility of the chromatin.
- In this example 63.14% of the mesh around the chromatin is accessible to the hypothetical protein. This number includes not only the particles but also the edges linking them (remember that a particle is a representation of a given locus of DNA). The second percentage printed (43.65%) corresponds to the percentage of accessible chromatin without taking its folding into consideration (that is, relative to a straight strand of chromatin).
- As stated above, in order to infer the proportion of accessible chromatin, a mesh is drawn around the chromatin strand. This mesh represents all possible positions of the hypothetical protein. The surface values are relative to this mesh, not to the real accessible surface of the chromatin. However, the numbers are proportional and the percentages are conserved.
- The dots mentioned in the output are the representation of the mesh; their number is proportional to the nump parameter. Accessibility is measured using these dots: if a dot is distant enough from every point of the chromatin strand, then it is considered accessible; if some part of the chromatin lies closer to the dot than the radius of the hypothetical protein, the dot is considered inaccessible, as the protein could not fit in its place. In the movie generated using the savefig parameter, dots are displayed in green when they represent a possible placement of the hypothetical protein, and in red when the protein would not fit.
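The dot test described in the list above can be sketched in a simplified form. The version below is an assumption-laden illustration, not TADbit's algorithm: it ignores the edges between particles and the superradius parameter, places the dots with a golden-spiral layout, and uses hypothetical helper names (`sphere_dots`, `accessibility`).

```python
from math import pi, sqrt, cos, sin

def sphere_dots(center, radius, nump):
    """Spread nump dots roughly evenly on a sphere (golden-spiral layout)."""
    golden = pi * (3.0 - sqrt(5.0))
    dots = []
    for k in range(nump):
        z = 1.0 - 2.0 * (k + 0.5) / nump   # height in (-1, 1)
        ring = sqrt(1.0 - z * z)           # radius of the circle at that height
        theta = golden * k
        dots.append((center[0] + radius * ring * cos(theta),
                     center[1] + radius * ring * sin(theta),
                     center[2] + radius * z))
    return dots

def accessibility(coords, radius, nump=100):
    """For each particle, count (accessible, inaccessible) dots of its mesh."""
    counts = []
    for i, center in enumerate(coords):
        acc = ina = 0
        for dot in sphere_dots(center, radius, nump):
            # a dot is blocked if any *other* particle is closer than radius
            blocked = any(
                sum((d - c) ** 2 for d, c in zip(dot, other)) < radius ** 2
                for j, other in enumerate(coords) if j != i)
            if blocked:
                ina += 1
            else:
                acc += 1
        counts.append((acc, ina))
    return counts

# toy model: three particles on a line, 600 nm apart, probed with a 500 nm radius
toy = [(0.0, 0.0, 0.0), (600.0, 0.0, 0.0), (1200.0, 0.0, 0.0)]
counts = accessibility(toy, 500.0)
for i, (acc, ina) in enumerate(counts, 1):
    print('particle %d: %d accessible, %d inaccessible' % (i, acc, ina))
```

In this toy case the middle particle has neighbors on both sides, so more of its dots are blocked than for the two end particles.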
In order to measure how “buried” each particle is, the function returns a list of values (stored, in the example above, in the acc_vs_inacc variable). This list contains, for each particle, the number of “green dots” and the number of “red dots”. A useful value derived from them is the buried percentage of each particle (100% means that the particle is completely inaccessible to the given protein).
Continuing with the example, these numbers can be obtained from the acc_vs_inacc list:
for i, acc, ina in acc_vs_inacc:
    if int(i) != i:
        continue
    prop = float(ina) / (acc + ina) * 100 if ina + acc else float('NaN')
    print 'particle %3s: %4.1f%% buried' % (int(i), prop),
    print ('|' if i % 4 else '\n'),
particle 1: 0.0% buried | particle 2: 5.3% buried | particle 3: 0.0% buried | particle 4: 0.0% buried
particle 5: 3.4% buried | particle 6: 56.8% buried | particle 7: 3.6% buried | particle 8: 13.3% buried
particle 9: 6.1% buried | particle 10: 14.8% buried | particle 11: 41.9% buried | particle 12: 4.0% buried
particle 13: 21.9% buried | particle 14: 10.0% buried | particle 15: 46.4% buried | particle 16: 2.4% buried
particle 17: 27.9% buried | particle 18: 26.5% buried | particle 19: 27.5% buried | particle 20: 12.9% buried
particle 21: 84.2% buried | particle 22: 2.9% buried | particle 23: 0.0% buried | particle 24: 28.6% buried
particle 25: 18.6% buried | particle 26: 32.1% buried | particle 27: 14.3% buried | particle 28: 14.3% buried
particle 29: 37.0% buried | particle 30: 2.3% buried | particle 31: 13.9% buried | particle 32: 19.1% buried
particle 33: 17.1% buried | particle 34: 31.7% buried | particle 35: 14.0% buried | particle 36: 28.6% buried
particle 37: 69.8% buried | particle 38: 0.0% buried | particle 39: 25.0% buried | particle 40: 81.0% buried
particle 41: 20.0% buried | particle 42: 20.0% buried | particle 43: 12.1% buried | particle 44: 50.0% buried
particle 45: 46.8% buried | particle 46: 24.3% buried | particle 47: 68.2% buried | particle 48: 23.5% buried
particle 49: 15.4% buried | particle 50: 17.6% buried | particle 51: 46.8% buried | particle 52: 91.3% buried
particle 53: 22.7% buried | particle 54: 48.8% buried | particle 55: 35.1% buried | particle 56: 53.1% buried
particle 57: 66.0% buried | particle 58: 72.4% buried | particle 59: 37.8% buried | particle 60: 24.4% buried
particle 61: 61.1% buried | particle 62: 7.7% buried | particle 63: 40.5% buried | particle 64: 59.6% buried
particle 65: 25.0% buried | particle 66: 24.4% buried | particle 67: 23.3% buried | particle 68: 31.0% buried
particle 69: 29.7% buried | particle 70: 9.8% buried | particle 71: 31.8% buried | particle 72: 70.6% buried
particle 73: 7.4% buried | particle 74: 11.4% buried | particle 75: 27.9% buried | particle 76: 27.7% buried
particle 77: 30.2% buried | particle 78: 8.7% buried | particle 79: 8.0% buried | particle 80: 40.0% buried
particle 81: 15.8% buried | particle 82: 42.3% buried | particle 83: 2.2% buried | particle 84: 13.0% buried
particle 85: 5.3% buried | particle 86: 26.7% buried | particle 87: 17.1% buried | particle 88: 5.6% buried
particle 89: 27.5% buried | particle 90: 51.1% buried | particle 91: 11.1% buried | particle 92: 13.6% buried
particle 93: 2.9% buried | particle 94: 17.5% buried | particle 95: 6.8% buried | particle 96: 0.0% buried
particle 97: 7.1% buried | particle 98: 0.0% buried | particle 99: 0.0% buried | particle 100: 0.0% buried
particle 101: 0.0% buried |
Note that in this example no particle is 100% buried.
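A list of this kind can also be used to rank particles rather than print them all. The sketch below uses made-up counts in the same (index, accessible, inaccessible) layout as the loop above; the names `buried_percent` and `toy_acc_vs_inacc` and the 50% threshold are illustrative choices, not part of TADbit.

```python
def buried_percent(acc, ina):
    """Percentage of inaccessible dots, or NaN when the particle has no dots."""
    total = acc + ina
    return 100.0 * ina / total if total else float('nan')

# made-up values; the non-integer index mimics an entry that is skipped
# by the `int(i) != i` test because it does not describe a particle
toy_acc_vs_inacc = [(1, 40, 0), (1.5, 10, 10), (2, 18, 2), (3, 5, 15), (4, 0, 0)]

buried = dict((int(i), buried_percent(acc, ina))
              for i, acc, ina in toy_acc_vs_inacc if int(i) == i)

# particles buried at 50% or more, most buried first
# (a NaN value never satisfies >=, so particles without dots are left out)
most_buried = sorted((p for p in buried if buried[p] >= 50.0),
                     key=lambda p: buried[p], reverse=True)
print(most_buried)
```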
To visualize what this result really means, the mesh can be displayed around particles only, by setting the option include_edges to False. In this case, the global accessibility value of the chromatin will change, but the individual statistics of the particles will be kept.
This time, the movie shows only the relevant part of the mesh for each particle. Note that only a part of the sphere surrounding each particle is displayed, as nearby edges prevent the protein from approaching the particle. For more details on how the mesh is built, refer to the function documentation: pytadbit.imp.impmodel.IMPmodel.accessible_surface()
acc_dots, tot_dots, acc_area, tot_area, acc_vs_inacc = model.accessible_surface(
500, nump=500, superradius=1500, verbose=True, include_edges=False, write_cmm_file='./model_partmesh.cmm',
savefig='/tmp/model_partmesh.webm', chimera_bin='chimera_nogui')
Accessible surface: 67.92 micrometers^2(10810 accessible times 0.00628318530718 micrometers) (10810 accessible dots of 14122 total times 0.00628 micrometers) - 76.55% of the contour mesh - 22.19% of a virtual straight chromatin (306.13 microm^2)
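The figures printed by the two calls can be cross-checked with a few lines of arithmetic. The sketch below relies on relations inferred from the printed output and not from TADbit's source: each dot appears to stand for an equal share 4πr²/nump of a sphere of radius r = 0.5 micrometers (0.0314... = π/100 and 0.00628... = π/500), and the "virtual straight chromatin" surface appears to match a cylinder of radius r along the contour length plus one sphere of radius r.

```python
from math import pi

radius = 0.5                   # protein radius in micrometers (500 nm)
contour = 96444.7657895 / 1000 # contour length printed earlier, in micrometers

def dot_area(nump):
    """Area assigned to one dot, assuming an even split of a sphere surface."""
    return 4 * pi * radius ** 2 / nump

# inferred surface of a straight strand: cylinder side plus two half-sphere caps
straight = 2 * pi * radius * contour + 4 * pi * radius ** 2

# (accessible dots, total dots, nump) as printed by the two calls
runs = {'with edges': (4253, 6736, 100), 'particles only': (10810, 14122, 500)}

for name in sorted(runs):
    acc, tot, nump = runs[name]
    area = acc * dot_area(nump)
    print('%s: %.2f micrometers^2, %.2f%% of the mesh, %.2f%% of straight chromatin'
          % (name, area, 100.0 * acc / tot, 100.0 * area / straight))
```

Under these assumptions the computed areas and percentages agree with the two "Accessible surface" lines to the printed precision.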